DETAILED ACTION

1. The following action is a response to the after-final communications of
04/03/2007. Claims 18, 24-26, and 31-33 have been amended by examiner's amendment. 
Further, claims 27-29 have been canceled and claim 35 has been added by examiner's 
amendment. Claims 18-26 and 30-35 are now pending in this application and are 
allowed. This action includes an examiner's amendment and reasons for allowance.

Examiner's Amendment 

2. An examiner's amendment to the record appears below. Should the changes be 
unacceptable to the applicant, an amendment may be filed as provided by 37 CFR 1.312. 
To ensure consideration of such an amendment, it MUST be submitted no later than the 
payment of the issue fee. 

Authorization for this examiner's amendment was given in a telephone interview 
with Mr. Michael Dunnam on April 3, 2007. The application has been amended as 
follows: 

In the claims: 

Please amend claim 18 as follows: 

18. A method of controlling a system to optimize an objective function 
thereof, the system [[being capable of]] performing a plurality of candidate actions and [[being
capable of]] monitoring response performances of a performance of a respective candidate
action, where the objective function is a function of the monitored response performances 
following decisions and actions taken, the method comprising the steps of: 

a) monitoring response performance of a respective candidate action that is 
chosen to be performed by the system; 

b) storing, according to the candidate action performed by the system, a
representation of said monitored response performance; 

c) calculating the expected growth in regret associated with each of the plurality 
of candidate actions, assessed using a probability distribution based on the historical 
response performances to date of said plurality of candidate actions, where the expected 
growth in regret is a system performance measure that is calculated to represent the trade- 
off between the relative merit of exploration of one or more apparently non-best 
candidate actions to mitigate the risk of ignoring one of said one or more apparently non- 
best candidate actions which may actually be the current best candidate action, with 
respect to the relative merit of exploiting what appears to be the current best candidate 
action but which in fact may not be the current best candidate action, based on said 
historical response performances to date;

[[c]]d) choosing as the next action which of the plurality of candidate actions that is
calculated to result in the lowest expected growth in regret after the chosen candidate
action is performed [[next performed]] by the system [[so as to optimize said objective
function by assessing, using the probability distribution of the response performance of
all of said plurality of candidate actions, which candidate action is estimated to result in
the lowest expected growth in regret after the chosen candidate action is performed by the
system]];

[[d]]e) commanding the system to perform the chosen next [[candidate]] action
[[identified to be the next performed in step c)]]; and

[[e]]f) repeating steps a) to [[d]]e) to control the system so as to substantially optimize
the objective function of the system.

[[where regret is a term that represents a system performance measure that
considers the relative merit of exploration of one or more apparently non-best candidate
actions, with respect to the relative merit of exploiting what appears to be the current best
candidate action based on historical response performances to date.]]

Please amend claim 24, line 3, as follows: 

[[f]]g) applying a temporal depreciation factor to the stored representations of the

Please amend claim 25, line 1, as follows:

25. A method according to claim 24 wherein step [[f]]g)

Please amend claim 26, line 3, as follows: 

[[f]]g) forcing the performance of each candidate action a minimum number of times

Please cancel claims 27-29. 

Please amend claim 31 as follows: 

31. A system having means for performing a plurality of candidate actions and
means for monitoring response performances of a performance of a respective candidate 
action during performance of an objective function of the system, where the objective 
function is a function of the monitored response performances following decisions and 
actions taken, the system further having a control apparatus that is programmed to control 
the objective function of the system [[according to the method of claim 18]] by performing
the method comprising the steps of: 

a) monitoring response performance of a respective candidate action that is 
chosen to be performed by the system; 

b) storing, according to the candidate action performed by the system, a 
representation of said monitored response performance;

c) calculating the expected growth in regret associated with each of the plurality 
of candidate actions, assessed using a probability distribution based on the historical 
response performances to date of said plurality of candidate actions, where the expected 
growth in regret is a system performance measure that is calculated to represent the trade- 
off between the relative merit of exploration of one or more apparently non-best 
candidate actions to mitigate the risk of ignoring one of said one or more apparently non- 
best candidate actions which may actually be the current best candidate action, with 
respect to the relative merit of exploiting what appears to be the current best candidate 
action but which in fact may not be the current best candidate action, based on said 
historical response performances to date; 

d) choosing as the next action the candidate action that is calculated to result in
the lowest expected growth in regret after the chosen candidate action is performed; 

e) commanding the system to perform the chosen next action; and

f) repeating steps a) to e) to control the system so as to substantially optimize the 
objective function of the system.

Please amend claim 32 as follows: 

32. A robot comprising the system according to claim 31, where the [[system
comprises a robot]] control apparatus of the system controls the objective function of the
robot so as to optimize the objective function of the robot.

Please amend claim 33 as follows: 

33. A control apparatus for controlling a system to optimize an objective 
function thereof, the system [[being capable of]] performing a plurality of candidate actions
and [[being capable of]] monitoring response performances of a performance of a respective
candidate action; where the objective function is a function of the monitored response
performances following decisions and actions taken, the control apparatus comprising
[[programmed to perform the steps of]]:

a) means for monitoring response performance of a respective candidate action 
that is chosen to be performed by the system; 

b) means for storing, according to the candidate action performed by the system, a 
representation of said monitored response performance; 

c) means for calculating the expected growth in regret associated with each of the 
plurality of candidate actions, assessed using a probability distribution based on the 
historical response performances to date of said plurality of candidate actions, where the 
expected growth in regret is a system performance measure that is calculated to represent 
the trade-off between the relative merit of exploration of one or more apparently non-best 
candidate actions to mitigate the risk of ignoring one of said one or more apparently non- 
best candidate actions which may actually be the current best candidate action, with 
respect to the relative merit of exploiting what appears to be the current best candidate 
action but which in fact may not be the current best candidate action, based on said
historical response performances to date; and

[[c]]d) means for choosing as the next action which of the plurality of candidate
actions that is calculated to result in the lowest expected growth in regret after the chosen
candidate action is performed [[next performed]] by the system [[so as to optimize said
objective function by assessing, using the probability distribution of the response
performance of all of said plurality of candidate actions, which candidate action is
estimated to result in the lowest expected growth in regret after the chosen candidate
action is performed by the system]];

[[d]]e) means for commanding the system to perform the chosen next [[candidate]]
action [[identified to be the next performed in step c)]]; and

[[e) repeating steps a) to d) to control]] wherein the control apparatus controls the
system so as to substantially optimize the objective function of the system.

[[where regret is a term that represents a system performance measure that
considers the relative merit of exploration of one or more apparently non-best candidate
actions, with respect to the relative merit of exploiting what appears to be the current best
candidate action based on historical response performances to date.]]

Please add claim 35 as follows: 

35. A method of controlling a system with two or more subsystems to 
optimize an objective function of the system, the system performing a plurality of 
candidate actions, wherein a candidate action is represented by the selection of a lower 
level subsystem from said two or more subsystems, and wherein the system monitors the 
response performance of the selected subsystem, where the objective function is a 
function of the monitored response performances following decisions and actions taken, 
the method comprising the steps of: 

a) monitoring response performance of a respective candidate action that is 
chosen to be performed by the system; 

b) storing, according to the candidate action performed by the system, a
representation of said monitored subsystem performance in response to the candidate 
action; 

c) calculating the expected growth in regret associated with each of the plurality 
of candidate actions, assessed using a probability distribution based on the historical 
response performances to date of said plurality of candidate actions, where the expected 
growth in regret is a system performance measure that is calculated to represent the trade- 
off between the relative merit of exploration of one or more apparently non-best 
candidate actions to mitigate the risk of ignoring one of said one or more apparently non- 
best candidate actions which may actually be the current best candidate action, with 
respect to the relative merit of exploiting what appears to be the current best candidate 
action but which in fact may not be the current best candidate action, based on said 
historical response performances to date; 

d) choosing as the next action the candidate action that is calculated to result in 
the lowest expected growth in regret after the chosen candidate action is performed by the 
system; 

e) commanding the system to perform the chosen next action using a 
corresponding lower level subsystem; and 

f) repeating steps a) to e) to control the system so as to substantially optimize the 
objective function of the system. 

Reasons for Allowance 

3. Claims 18-26 and 30-35 are allowed. 

4. The following is an examiner's statement of reasons for allowance: None of the
prior art of record, taken individually or in any combination, teaches, inter alia, iteratively
repeating the steps of calculating the expected growth in regret associated with each of
the plurality of candidate actions as a trade-off between the relative merit of exploration
of one or more apparently non-best candidate actions to mitigate the risk of ignoring a
non-best candidate action which may actually be the current best candidate action, with
respect to the relative merit of exploiting what appears to be the current best candidate 
action but which in fact may not be the current best candidate action and then choosing 
and performing a next candidate action that is calculated to result in the lowest expected 
growth in regret after the chosen candidate action is performed. 
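
For illustration only, the iterative steps recited above can be sketched as a small Python loop. The record does not give a formula for the expected growth in regret, so everything numeric here is an assumption: responses are treated as success/failure outcomes, the probability distribution is a Beta posterior per candidate action, and the growth in regret is approximated by a one-step lookahead (immediate expected regret plus the expected regret of exploiting the apparent best action for a hypothetical `horizon` of further steps). The function names are hypothetical and are not taken from the application.

```python
import random


def expected_regret(posteriors, action, rng, n_samples=200):
    """Monte Carlo estimate of E[max_j theta_j - theta_action] under Beta posteriors."""
    total = 0.0
    for _ in range(n_samples):
        draws = [rng.betavariate(a, b) for a, b in posteriors]
        total += max(draws) - draws[action]
    return total / n_samples


def apparent_best(posteriors):
    """Index of the action with the highest posterior mean (the apparent best)."""
    means = [a / (a + b) for a, b in posteriors]
    return max(range(len(means)), key=means.__getitem__)


def expected_growth_in_regret(posteriors, action, rng, horizon=10):
    """Assumed lookahead score: regret of trying `action` now, plus the expected
    regret of exploiting the apparent best action afterwards for `horizon` steps."""
    a, b = posteriors[action]
    p_success = a / (a + b)
    growth = expected_regret(posteriors, action, rng)
    for outcome, prob in ((1, p_success), (0, 1.0 - p_success)):
        updated = list(posteriors)
        updated[action] = (a + outcome, b + 1 - outcome)
        follow_up = apparent_best(updated)
        growth += horizon * prob * expected_regret(updated, follow_up, rng)
    return growth


def control_loop(true_rates, steps=50, seed=0):
    """Repeat: calculate, choose, command, monitor, store."""
    rng = random.Random(seed)
    posteriors = [(1, 1)] * len(true_rates)  # uniform prior per candidate action
    counts = [0] * len(true_rates)
    for _ in range(steps):
        # calculate the expected growth in regret for each candidate action
        scores = [expected_growth_in_regret(posteriors, k, rng)
                  for k in range(len(posteriors))]
        # choose the action with the lowest expected growth in regret
        chosen = min(range(len(scores)), key=scores.__getitem__)
        # command the (simulated) system and monitor its response
        response = 1 if rng.random() < true_rates[chosen] else 0
        # store the response against the action that was performed
        a, b = posteriors[chosen]
        posteriors[chosen] = (a + response, b + 1 - response)
        counts[chosen] += 1
    return counts, posteriors
```

A call such as `control_loop([0.2, 0.5, 0.8])` simulates three candidate actions with hypothetical success rates and returns how often each was chosen together with the final posteriors. The lookahead term is what lets an apparently non-best action score well when trying it could change which action appears best.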

The prior art references most closely resembling the Applicant's claimed 
invention are Merriman et al. (U.S. 2002/0099600), Eppen et al. (Quantitative Concepts
for Management), Masch (U.S. 5,930,762), McClave et al. (A First Course in Business
Statistics), Jameson (U.S. 6,032,123), and Strickland et al. (U.S. 5,790,407).

Merriman et al. teaches that the response performance to a candidate action 
(advertising) is monitored and stored in the historical database of the system. Based on 
the knowledge gained and stored concerning response performance, a next action (ad) is 
chosen to be performed by the system to optimize an objective function by assessing, 
using a predictive model, empirical data to determine which action will maximize 
feedback/minimize economic loss after the chosen candidate action is performed based 
on historical response performances to date by the system. This is an iterative process, 
where the model is refined over time. However, Merriman et al. does not explicitly 
disclose that this minimizing of economic loss is in terms of regret or calculating the 
expected growth in regret as a trade-off between the relative merit of exploration of one 
or more apparently non-best candidate actions with respect to the relative merit of 
exploiting what appears to be the current best candidate action. 

Eppen et al. teaches the concept of regret theory utilized in economic and decision 
theory, where each possible decision (or action) has associated states of nature 
(outcomes). Therefore, in each set of possible decisions (or actions), an action is
associated with an apparently best outcome and another action is associated with an 
apparently non-best outcome. They are the apparent best and non-best outcomes because 
they have not yet occurred. Eppen et al. disclose the generation of a regret table where
regret is expressed as a performance measure, and the table shows the merit (i.e. value, 
advantage, worth) of exploring a non-best action (decision) versus the merit of exploiting 
a best action (decision), as represented by the numbers in the table that reflect 
opportunity cost/loss. Finally, Eppen et al. specifically discusses that when the decision 
maker/software knows the probability distribution on the state of nature, regret can be 
minimized. However, Eppen et al. does not explicitly disclose calculating the expected 
growth in regret as a trade-off between the relative merit of exploration of one or more 
apparently non-best candidate actions to mitigate the risk of ignoring a non-best
candidate action which may actually be the current best candidate action, with respect to
the relative merit of exploiting what appears to be the current best candidate action but 
which in fact may not be the current best candidate action. 
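
As a minimal sketch of the kind of regret table described above, the following Python computes opportunity loss per decision and state of nature, then weights it by a known probability distribution over the states. The payoff figures and probabilities are hypothetical and do not come from Eppen et al.; the function names are likewise illustrative.

```python
def regret_table(payoffs):
    """Return regret[d][s] = best payoff attainable in state s minus payoffs[d][s]."""
    n_states = len(payoffs[0])
    best_per_state = [max(row[s] for row in payoffs) for s in range(n_states)]
    return [[best_per_state[s] - row[s] for s in range(n_states)] for row in payoffs]


def expected_regrets(regrets, state_probs):
    """Weight each decision's regrets by the probability of each state of nature."""
    return [sum(p * r for p, r in zip(state_probs, row)) for row in regrets]


# Hypothetical payoffs: rows are decisions, columns are states of nature.
payoffs = [[100, 20], [60, 60], [0, 120]]
state_probs = [0.25, 0.75]  # assumed known distribution over the two states

regrets = regret_table(payoffs)
expected = expected_regrets(regrets, state_probs)
best_decision = min(range(len(expected)), key=expected.__getitem__)
```

With these figures the regret table is `[[0, 100], [40, 60], [100, 0]]`, the expected regrets are `[75.0, 55.0, 25.0]`, and the third decision (index 2) minimizes expected regret. The table is computed once from fixed payoffs, so nothing in it rewards trying an apparently non-best decision to learn more.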

Masch discloses a computer system that aids a user in making decisions under
conditions of uncertainty. Masch uses models to find, analyze, and fine-tune tradeoffs 
and to construct alternative candidate strategies. A constructed strategy is then selected 
for implementation in the physical system based on decision analysis including the use of 
regret methods and matrices. Regret-based methods compare candidate strategies in an 
attempt to minimize the negative impact of uncertainty. The comparison leads to the 
output of a regret matrix that represents the payoffs of potential outcomes of each 
candidate strategy as well as a regret estimate that represents the opportunity loss of the 
strategy if it turns out to not be the best scenario. Therefore, while Masch discloses using
regret analysis to choose a candidate strategy to implement in a system and a chosen 
strategy turning out to not be the best (in terms of regret), Masch does not explicitly teach 
a tradeoff calculation that includes the merit of exploration of a non-best candidate action 
based on the risk of it actually being the best candidate action after its performance. In 
all of Masch's examples, Masch always chooses the candidate strategy that is expected to 
result in the lowest regret based on currently known data (before an action is performed). 
Thus, Masch does not explicitly disclose iteratively repeating the steps of calculating the
expected growth in regret as a trade-off between the relative merit of exploration of one
or more apparently non-best candidate actions to mitigate the risk of ignoring a non-best
candidate action which may actually be the current best candidate action, with respect to
the relative merit of exploiting what appears to be the current best candidate action but 
which in fact may not be the current best candidate action. 
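
The one-shot selection attributed to Masch above can be illustrated with a short Python sketch. It assumes the common minimax-regret rule (pick the strategy whose worst-case regret is smallest); that rule is a stand-in chosen for illustration and is not asserted to be Masch's exact method. The payoff matrix and function name are hypothetical.

```python
def minimax_regret_choice(payoffs):
    """Pick the strategy whose worst-case regret across scenarios is smallest.

    payoffs[k][s] is the payoff of candidate strategy k under scenario s.
    Returns the chosen strategy index and the worst-case regret of each strategy.
    """
    n_scenarios = len(payoffs[0])
    best_per_scenario = [max(row[s] for row in payoffs) for s in range(n_scenarios)]
    worst_regret = [
        max(best_per_scenario[s] - row[s] for s in range(n_scenarios))
        for row in payoffs
    ]
    chosen = min(range(len(payoffs)), key=worst_regret.__getitem__)
    return chosen, worst_regret


# Hypothetical payoffs: rows are candidate strategies, columns are scenarios.
payoffs = [[100, 20], [60, 60], [0, 120]]
chosen, worst_regret = minimax_regret_choice(payoffs)
```

Here the worst-case regrets are `[100, 60, 100]`, so the second strategy (index 1) is selected. The decision uses only data known before any action is performed and is made once, with no repeated recalculation after observing a response.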

Finally, McClave et al., Jameson, and Strickland et al. teach various features of 
the claimed invention. These references teach determining the sample size needed to
make reliable decisions, using a Monte Carlo algorithm to provide understanding using
"what if" simulation, and controlling external devices, such as robots, by comparing the
response profile of the device to the actual response of the device, respectively.
However, none of McClave et al., Jameson, and Strickland et al. discloses regret
calculations or calculating the expected growth in regret as a trade-off between the 
relative merit of exploration of one or more apparently non-best candidate actions with 
respect to the relative merit of exploiting what appears to be the current best candidate 
action. 

Any comments considered necessary by applicant must be submitted no later than
the payment of the issue fee and, to avoid processing delays, should preferably 
accompany the issue fee. Such submissions should be clearly labeled "Comments on 
Statement for Reasons for Allowance". 

Conclusion 

The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Dembo (U.S. 5,148,365) discloses scenario optimization and using probability 
values to determine the expectancy that a scenario will occur. 

Masch (WO 98/13776) discloses candidate strategy considerations using regret
theory.

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Beth Van Doren whose telephone number is 571-272-6737.
The examiner can normally be reached on M-F, 8:00-5:00.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Tariq Hafiz, can be reached on 571-272-6729. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through Private PAIR only. For 
more information about the PAIR system, see http://pair-direct.uspto.gov. Should you 
have questions on access to the Private PAIR system, contact the Electronic Business
Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO
Customer Service Representative or access to the automated information system, call 
800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



bvd 

April 4, 2007 




